GPL Tools/match.py
This is the script for matching functions and data addresses between a bunch of IDC databases.

Usage will be uploaded soon.

Dependencies

* Python (I use 2.6 under Linux). Just to be sure, install numpy/scipy and ipython.
* arm-elf-gcc in your PATH (see Build instructions/550D for how to do that)
* It doesn't require IDAPython nor IDA, just the IDC files.

Preparing input files

Prepare a working directory where you will put the input files. You will need:

* Some dumps, with the .bin extension. Include the load address in the dump name.
* Some IDC files. Try to give them names somewhat similar to the dumps, to help the autodetection.
* The script (called for now match.py), in the same folder (or in PATH, if you like)

For example, those names are valid:

    5D_204_06_0xff810000.bin
    550D_108_05_0xff010000.bin
    500D_0xff010000.bin
    5D 204 AJROM0.idc
    550D_108_20101116_indy_ROM0.idc

Running

Then just say:

    python match.py

and you should get something like this:

    Input files:
    Binary dump (*.bin)           LoadAddr   IDC database (*.idc)
    5D_204_06_0xff810000.bin      FF810000   5D 204 AJROM0.idc
    500D_0xff010000.bin           FF010000   n/a
    550D_108_05_0xff010000.bin    FF010000   550D_108_20101116_indy_ROM0.idc

    Disassembling 5D_204_06_0xff810000.bin ... ok
    Disassembling 500D_0xff010000.bin ... ok
    Disassembling 550D_108_05_0xff010000.bin ... ok
    Parsing 5D 204 AJROM0.idc... found 40692 MakeName's and 19191 MakeFunction's
    Parsing 550D_108_20101116_indy_ROM0.idc... found 56768 MakeName's and 18053 MakeFunction's
    Parsing disassembly of 5D_204_06_0xff810000.bin... found 1263894 lines
    Parsing disassembly of 500D_0xff010000.bin... found 1171162 lines
    Parsing disassembly of 550D_108_05_0xff010000.bin... found 1395198 lines
    Creating codesigs for 5D_204_06_0xff810000.bin...
    Creating codesigs for 550D_108_05_0xff010000.bin...
    saving cache... ok
    Found 6623 raw code matches between 550D_108_05_0xff010000.bin and 5D_204_06_0xff810000.bin.

Results

To find the results, just sort the working directory by modification date.
* match-log.txt: shows detailed info about the matching process, for each pair of functions.

Advanced use and debugging

Interactive console

You can run it in IPython; after the script finishes, you can poke around and make various queries.

    $ ipython
    In [1]: run match.py
    ...
    In [2]: bins                              # what dumps we have loaded?
    Out[2]: '5D_204_06_0xff810000.bin'
    In [3]: t2i, mk2 = bins                   # give a short name to each one
    In [4]: hex(D[t2i].ROM[0xff011e1c])       # read from ROM; only multiples of 4 allowed here
    Out[4]: '0x73616b61'
    In [5]: BYTE(t2i, 0xff011bde)             # this is for any address; reads a single byte from ROM
    Out[5]: 143
    In [6]: GuessString?                      # how to get help for a function
    ...
    Definition: GuessString(ROM, a)
    ...
    In [7]: GuessString(t2i, 0xff011e1c)      # find a string starting from a known address
    Out[7]: 'akashimorino'

Internals

A dump is identified by its file name, which is used as the index into the various dictionaries.

Global variables

* bins: list of dumps (i.e. file names with .bin extension)
* loadaddrs: dictionary of load addresses for each dump
* idcs: dictionary of idc file names for each dump
* D: dictionary containing lots of info about dumps: ROM contents, IDC names, functions, signatures...

Functions

* BYTE(bin, addr): read a byte from the ROM, from the dump whose file name is bin
* GuessString(bin, addr): detect a string starting from addr
* funcname(bin, addr): function name extracted from IDC, or sub_ABCD1234 if it's not found
* getname(bin, addr): similar to funcname, but used for other names (not functions).
* ...

Functions for interactive use:

* find_data_ref(bin, value): look for references to a given value.

    In [1]: find_data_ref(t2i,0x2b74)
    DebugMsg+112:
    ff067458: 2a000003 bcs ff06746c <_binary_550D_108_05_0xff010000_bin_start+0x5746c>
    ff06745c: e59f00f4 ldr r0, [pc, #244] ; ff067558 <_binary_550D_108_05_0xff010000_bin_start+0x57558>
    ff067460: e7901101 ldr r1, [r0, r1, lsl #2]
    pointer to 0x2b74
    ... etc ...
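
To make the address arithmetic behind helpers like BYTE and GuessString more concrete, here is a minimal sketch. It is not the actual match.py code: it assumes Python 3, a ROM held in a bytearray, little-endian 32-bit words, and that a string is simply a run of printable ASCII bytes. The names parse_load_addr, read_byte, read_word and guess_string are hypothetical, made up for illustration only.

```python
import re
import struct


def parse_load_addr(dump_name):
    """Return the load address embedded in a dump name such as 500D_0xff010000.bin."""
    m = re.search(r"0x([0-9a-fA-F]{8})", dump_name)
    if m is None:
        raise ValueError("no load address found in %r" % dump_name)
    return int(m.group(1), 16)


def read_byte(rom, load_addr, addr):
    """Read one byte; the offset into the dump is simply addr - load_addr."""
    return rom[addr - load_addr]


def read_word(rom, load_addr, addr):
    """Read a little-endian 32-bit word; addr must be a multiple of 4."""
    if addr % 4 != 0:
        raise ValueError("unaligned address 0x%x" % addr)
    return struct.unpack_from("<I", rom, addr - load_addr)[0]


def guess_string(rom, load_addr, addr, max_len=256):
    """Collect printable ASCII characters from addr up to the first other byte."""
    start = addr - load_addr
    chars = []
    for b in rom[start:start + max_len]:
        if not 32 <= b < 127:
            break
        chars.append(chr(b))
    return "".join(chars)


# Synthetic "ROM" for the demo (an assumption, not a real dump):
# 16 zero bytes followed by a NUL-terminated string.
load_addr = parse_load_addr("550D_108_05_0xff010000.bin")
rom = bytearray(16) + bytearray(b"akashimorino\x00")
print(hex(read_word(rom, load_addr, load_addr + 16)))
print(guess_string(rom, load_addr, load_addr + 16))
```

With this synthetic data, the word read at the start of the string is 0x73616b61, which is the four characters 'akas' interpreted as a little-endian integer. That is the same pairing of word value and string seen in the IPython session above.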